* MASTER DO FILE & README
********************************************************************************
********************************************************************************

* SET PATHS
********************************************************************************
* Edit the path below so that it points to the Replication folder on your machine.
global path "~/IFS Dropbox/Alison Andrew/Early Marriage and Willingness to Pay/Replication"

global data "$path/Data"
global output "$path/Output"
global cf_choiceprobs "$path/Output/counterfactuals"
global bs_choiceprobs "$path/Output/bs_results/choiceprobs_bootstraps"

* INSTALL PACKAGES
********************************************************************************
ssc install unique
ssc install cibar
ssc install schemepack
net install grc1leg, from(http://www.stata.com/users/vwiggins)

* PART 0) PREPARE DATA
********************************************************************************

* 0a) Clean 5-year trajectory data for the observed sample. 
*	We will use this data later to compare against our predicted trajectories 
*	and to assess heterogeneity by actual choice.
*-------------------------------------------------------------------------------
do "$path/0_prep_data/0a_process_5year_trajectory_data"

* 0b) Generate weights that we will use to construct bootstrapped standard errors.
*	We use the fractional (Bayesian) bootstrap and draw weights from the
*	Dirichlet distribution.
*-------------------------------------------------------------------------------
do "$path/0_prep_data/0b_generate bootstrap weights"
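
* Illustrative sketch only (not part of the replication code; the script in 0b
*	may construct its weights differently). One common way to draw fractional
*	(Bayesian) bootstrap weights is to draw independent Gamma(1,1) variates and
*	normalise them to sum to one, which gives a single draw from a flat
*	Dirichlet(1,...,1) distribution. The seed, the sample size and the variable
*	names g and w are hypothetical. The block is commented out so that running
*	this master file does not alter the data in memory.
/*
clear
set seed 12345
set obs 100
generate double g = rgamma(1, 1)     // independent Gamma(shape 1, scale 1) draws
quietly summarize g
generate double w = g / r(sum)       // normalise: one Dirichlet(1,...,1) draw
quietly summarize w
assert reldif(r(sum), 1) < 1e-10     // the weights sum to one
*/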


* PART 1) REDUCED FORM ANALYSIS (IN STATA)
********************************************************************************

* 1a) Generate sample descriptive tables: 
			* Table 1: Sample Summary Statistics
			* Table A1: Balance by whether respondent was assigned to 
			* 			ex-post vs. ex-ante survey instrument
*-------------------------------------------------------------------------------
do "$path/1_reduced_form/1a_sample_descriptive_table"

* 1b) Generate descriptives of ex-post and ex-ante choice data:
			* Figure 2: Ex-Post and Ex-Ante Descriptives
			* Table A.11: Testing the Impact of Vignette Salience on Response Patterns
*-------------------------------------------------------------------------------
do "$path/1_reduced_form/1b_response_patterns_descriptives"

* 1c) Generate descriptives of direct expectations elicitation:
			* Figure A.9: Expected Match Has a Government Job by Girl's Education
			* Table A.6: Determinants of Expected Match
*-------------------------------------------------------------------------------
do "$path/1_reduced_form/1c_direct_expectations_descriptives"

* 1d) Generate descriptives of groom's side preference experiment:
			* Table A.5: Determinants of Groom Choices: Reduced Form Probit
*-------------------------------------------------------------------------------
do "$path/1_reduced_form/1d_groom_side_descriptives"


* PART 2) STRUCTURAL ESTIMATION AND ANALYSIS (IN MATLAB)
********************************************************************************

/*% Note on reproducibility of MATLAB section.
% It is a known issue that MATLAB optimization routines will not always reach the
% exact same solutions on systems with different OS versions, CPUs, or BIOS
% settings (see
% https://uk.mathworks.com/matlabcentral/answers/2144809-why-do-i-receive-different-results-from-two-separate-runs-of-the-same-matlab-operation).

% In practice we have found that our preference estimates, and bootstraps
% thereof, created in sections 1 and 2 of this file differ between machines
% beginning around the 6th decimal place. These slight differences therefore
% never affect the preference results at the level of precision reported
% in the paper. We have found slightly larger differences between machines
% in the bootstraps of the belief estimates. In our runs on different
% machines, differences in the estimated standard errors of the belief
% estimates occasionally appear at the 3rd significant figure (but are
% generally confined to later figures that do not affect the results at
% the precision reported in the paper). We have not found any sensitivity of
% the main estimates between machines. Digging further, we have found that
% the main cause of these differences in the bootstrapped belief estimates
% is the (very small) differences in the preference estimates, which then
% get passed to the belief estimation.

% All analysis presented in this paper was created on a Dell laptop with the
% following characteristics: 

    % OS Name	Microsoft Windows 11 Pro
    % Version	10.0.22621 Build 22621
    % OS Manufacturer	Microsoft Corporation
    % System Manufacturer	Dell Inc.
    % System Model	XPS 15 9520
    % System Type	x64-based PC
    % System SKU	0B19
    % Processor	12th Gen Intel(R) Core(TM) i9-12900HK, 2500 Mhz, 14 Core(s), 20 Logical Processor(s)
    % BIOS Version/Date	Dell Inc. 1.29.0, 11/12/2024

% In our replication package, we include all the MATLAB output generated in this
% analysis in the "Output" folder. In particular, "Output/bs_results"
% contains the preference and belief parameter estimates at every bootstrap.
% Scripts in this replication package write to the "Output_New" folder, which
% is initially empty. When preference estimates are read back in to estimate
% beliefs, they are read in from the "Output" folder (i.e. the original
% estimates used in the paper). This could be changed to the "Output_New"
% folder if the user wants to use newly generated estimates.
*/

* Run the following scripts in order in MATLAB:

* $path/2_matlab_estimation/f2a_main_analysis
*-------------------------------------------------------------------------------
/*
Script performs main preference and belief estimation and produces the following outputs: 
	* Table 2: Structural Preference Parameters
	* Figure 3: Preferences over a Daughter's Age and Education at Marriage
	* Table 3: Structural Belief Parameters
	* Figure 4: Probability of high quality marriage offer
	* Figure A.3: Histogram of groom quality

File also generates the choice probabilities (and bootstraps thereof) that are
used in Stata to generate our predicted trajectories (Figure 5, Figure 6).
*/

* $path/2_matlab_estimation/f2b_heterogeneity
*-------------------------------------------------------------------------------
/*
Script performs all analysis of heterogeneity in preferences and beliefs and 
generates the following outputs: 
	* Table 4
	* Tables A7-A10
*/

* $path/2_matlab_estimation/f2c_robustness_prefs
*-------------------------------------------------------------------------------
/*
Script analyses robustness of preference estimates to not allowing for inattention and produces: 
	* Table A3
*/

* $path/2_matlab_estimation/f2d_robustness_beliefs
*-------------------------------------------------------------------------------
/*
Script analyses robustness of belief estimates to 7 different modifications of 
assumptions and generates:
	* Table A4
*/

* $path/2_matlab_estimation/f2e_counterfactuals
*-------------------------------------------------------------------------------
/*
Script generates model predictions under various counterfactual preferences
and beliefs. These underlie the profiles in:
	* Figure 7
	* Figure A11
	* Figure A12
*/

* PART 3) VALIDATION AND COUNTERFACTUAL ANALYSIS
********************************************************************************

* 3a) Create descriptives of trajectories observed in the observational data
*	(i.e. proportion of girls in school and married at different ages) and their standard errors
*-------------------------------------------------------------------------------
do "$path/3_validation_and_counterfactuals/3a_clean_and_collapse_observational_data"

* 3b) Compare model trajectories with observed trajectories (unconditionally and conditional on shifters of preferences and beliefs). Code generates:
	* Figure 5: Predicted vs. Observed Trajectories
	* Figure 6: Predicted vs. Observed heterogeneity: Preference and Belief Shifters
	* Figure A4: Implied vs. Observed heterogeneity in marriage by shifters of preferences and beliefs
	* Figure A5: Formal comparison between model predictions and observed patterns
	* Figure A6: Formal comparison between model predictions and observed patterns for schooling by shifters of preferences and beliefs
	* Figure A7: Formal comparison between model predictions and observed patterns for marriage by shifters of preferences and beliefs
*-------------------------------------------------------------------------------
do "$path/3_validation_and_counterfactuals/3b_compare_model_trajectories_with_observed"

* 3c) Generate trajectories of marriage and schooling patterns under various counterfactuals. Generates: 
	* Figure 7: Counterfactual schooling profiles
	* Figure A11: Counterfactual marriage profiles
	* Figure A12: Impact of shocks on hazard rate of marriage by age
*-------------------------------------------------------------------------------
do "$path/3_validation_and_counterfactuals/3c_counterfactuals"
